bug fix
Do AI models help produce verified bug fixes?
Huang, Li, Mustafin, Ilgiz, Piccioni, Marco, Schena, Alessandro, Weber, Reto, Meyer, Bertrand
Among areas of software engineering where AI techniques, particularly Large Language Models (LLMs), seem poised to yield dramatic improvements, an attractive candidate is Automatic Program Repair (APR): the production of satisfactory corrections to software bugs. Does this expectation materialize in practice? How do we find out, making sure that proposed corrections actually work? If programmers have access to LLMs, how do they actually use them to complement their own skills? To answer these questions, we took advantage of the availability of a program-proving environment, which formally determines the correctness of proposed fixes, to conduct a study of program debugging with two randomly assigned groups of programmers, one with access to LLMs and the other without, both validating their answers through the proof tools. The methodology relied on a division into general research questions (Goals in the Goal-Query-Metric approach), specific elements admitting specific answers (Queries), and measurements supporting these answers (Metrics). While applied so far to a limited sample size, the results are a first step towards delineating a proper role for AI and LLMs in providing guaranteed-correct fixes to program bugs. These results are surprising compared to what one might expect from the use of AI for debugging and APR. The contributions also include: a detailed methodology for experiments on the use of LLMs for debugging, which other projects can reuse; a fine-grained analysis of programmer behavior, made possible by full-session recording; a definition of patterns of use of LLMs, with 7 distinct categories; and validated advice for getting the best out of LLMs for debugging and Automatic Program Repair.
- Research Report > Experimental Study (1.00)
- Research Report > New Finding (0.88)
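The abstract's Goal-Query-Metric decomposition can be pictured as a small tree from goals down to concrete measurements. The sketch below is purely illustrative: the goal, queries, and metric names are invented examples, not the study's actual instruments.

```python
# Hypothetical Goal/Query/Metric breakdown in the spirit of the study's
# methodology; every string here is an invented placeholder.
gqm = {
    "goal": "Assess whether LLM access improves verified bug fixing",
    "queries": [
        {
            "query": "Do LLM users produce more proof-verified fixes?",
            "metrics": ["verified_fixes_per_participant", "time_to_verified_fix"],
        },
        {
            "query": "How do programmers interleave LLM use with proving?",
            "metrics": ["llm_prompts_per_bug", "usage_pattern_category"],
        },
    ],
}

def metrics_for(goal_tree):
    """Flatten the tree into the list of metrics that must be collected."""
    return [m for q in goal_tree["queries"] for m in q["metrics"]]

print(metrics_for(gqm))
# ['verified_fixes_per_participant', 'time_to_verified_fix',
#  'llm_prompts_per_bug', 'usage_pattern_category']
```

The point of the structure is traceability: every metric collected in the experiment can be walked back up to the query it answers and the goal that query serves.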
A Comprehensive Study of Bug-Fix Patterns in Autonomous Driving Systems
Chen, Yuntianyi, Huai, Yuqi, He, Yirui, Li, Shilong, Hong, Changnam, Chen, Qi Alfred, Garcia, Joshua
As autonomous driving systems (ADSes) become increasingly complex and integral to daily life, the importance of understanding the nature and mitigation of software bugs in these systems has grown correspondingly. Addressing the challenges of software maintenance in autonomous driving systems (e.g., handling real-time system decisions and ensuring safety-critical reliability) is crucial due to the unique combination of real-time decision-making requirements and the high stakes of operational failures in ADSes. The potential of automated tools in this domain is promising, yet there remains a gap in our comprehension of the challenges faced and the strategies employed during manual debugging and repair of such systems. In this paper, we present an empirical study that investigates bug-fix patterns in ADSes, with the aim of improving reliability and safety. We analyzed the commit histories and bug reports of two major autonomous driving projects, Apollo and Autoware, examining 1,331 bug fixes to study bug symptoms, root causes, and bug-fix patterns. Our study reveals several dominant bug-fix patterns, including those related to path planning, data flow, and configuration management. Additionally, we find that the frequency distribution of bug-fix patterns varies significantly depending on their nature and types, and that certain categories of bugs are recurrent and more challenging to eliminate. Based on our findings, we propose a hierarchy of ADS bugs and two taxonomies of 15 syntactic bug-fix patterns and 27 semantic bug-fix patterns that offer guidance for bug identification and resolution. We also contribute a benchmark of 1,331 ADS bug-fix instances.
- North America > United States > California > San Francisco County > San Francisco (0.14)
- North America > United States > California > Los Angeles County > Los Angeles (0.14)
- Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
- Transportation > Ground > Road (1.00)
- Information Technology > Robotics & Automation (1.00)
- Automobiles & Trucks (1.00)
Cracks in The Stack: Hidden Vulnerabilities and Licensing Risks in LLM Pre-Training Datasets
Jahanshahi, Mahmoud, Mockus, Audris
A critical part of creating code suggestion systems is the pre-training of Large Language Models on vast amounts of source code and natural language text, often of questionable origin or quality. This may contribute to the presence of bugs and vulnerabilities in code generated by LLMs. While efforts to identify bugs at or after code generation exist, it is preferable to pre-train or fine-tune LLMs on curated, high-quality, and compliant datasets. The need for vast amounts of training data necessitates that such curation be automated, with minimal human intervention. We propose an automated source code curation technique that leverages the complete version history of open-source software (OSS) projects to improve the quality of training data. The approach identifies training data samples that have been modified in at least one OSS project and pinpoints the subset of samples whose later changes include fixes for bugs or vulnerabilities. We evaluate this method on The Stack v2 dataset and find that 17% of the code versions in the dataset have newer versions, with 17% of those representing bug fixes, including 2.36% addressing known CVEs. The deduplicated version of Stack v2 still includes blobs vulnerable to 6,947 known CVEs. Furthermore, 58% of the blobs in the dataset were never modified after creation, suggesting they likely represent software with minimal or no use. Misidentified blob origins present an additional challenge, as they lead to the inclusion of non-permissively licensed code, raising serious compliance concerns. By addressing these issues, the training of new models can avoid perpetuating buggy code patterns or license violations. We expect our results to inspire process improvements for automated data curation, with the potential to enhance the reliability of outputs generated by AI tools.
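The core filtering idea — drop a training sample when its later version history shows a bug or CVE fix — can be sketched as below. This is a minimal illustration under stated assumptions: the `Blob` record, the keyword heuristic, and all identifiers are invented for the example; the paper's actual pipeline over full OSS version history is not reproduced here.

```python
from dataclasses import dataclass, field

# Hypothetical stand-in for version-history metadata attached to each
# training blob (invented structure, not the paper's data model).
@dataclass
class Blob:
    blob_id: str
    has_newer_version: bool = False           # was the file later modified?
    later_commit_messages: list = field(default_factory=list)

# Naive keyword heuristic for labeling a later commit as a bug/security fix.
BUG_KEYWORDS = ("fix", "bug", "cve", "vulnerab", "patch")

def is_bug_fix(message: str) -> bool:
    msg = message.lower()
    return any(k in msg for k in BUG_KEYWORDS)

def curate(blobs):
    """Split blobs into kept samples and samples superseded by a bug fix."""
    kept, dropped = [], []
    for b in blobs:
        superseded_by_fix = b.has_newer_version and any(
            is_bug_fix(m) for m in b.later_commit_messages
        )
        (dropped if superseded_by_fix else kept).append(b)
    return kept, dropped

blobs = [
    Blob("a1", True, ["Fix CVE-2021-1234 buffer overflow"]),
    Blob("b2", True, ["Refactor naming"]),
    Blob("c3", False),
]
kept, dropped = curate(blobs)
print([b.blob_id for b in kept])     # ['b2', 'c3']
print([b.blob_id for b in dropped])  # ['a1']
```

In practice a keyword heuristic would be only a first pass; the appeal of the version-history signal is that it flags buggy code that was later corrected without having to analyze the code itself.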
Instruct, Not Assist: LLM-based Multi-Turn Planning and Hierarchical Questioning for Socratic Code Debugging
Kargupta, Priyanka, Agarwal, Ishika, Hakkani-Tur, Dilek, Han, Jiawei
Socratic questioning is an effective teaching strategy, encouraging critical thinking and problem-solving. The conversational capabilities of large language models (LLMs) show great potential for providing scalable, real-time student guidance. However, current LLMs often give away solutions directly, making them ineffective instructors. We tackle this issue in the code debugging domain with TreeInstruct, an Instructor agent guided by a novel state space-based planning algorithm. TreeInstruct asks probing questions to help students independently identify and resolve errors. It estimates a student's conceptual and syntactical knowledge to dynamically construct a question tree based on their responses and current knowledge state, effectively addressing both independent and dependent mistakes concurrently in a multi-turn interaction setting. In addition to using an existing single-bug debugging benchmark, we construct a more challenging multi-bug dataset of 150 coding problems, incorrect solutions, and bug fixes -- all carefully constructed and annotated by experts. Extensive evaluation shows TreeInstruct's state-of-the-art performance on both datasets, proving it to be a more effective instructor than baselines. Furthermore, a real-world case study with five students of varying skill levels further demonstrates TreeInstruct's ability to guide students to debug their code efficiently with minimal turns and highly Socratic questioning.
- Education > Educational Setting (1.00)
- Education > Curriculum > Subject-Specific Education (0.67)
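The idea of planning questions from an estimated knowledge state can be pictured with a toy scheduler: weaker concepts get deeper (more scaffolded) question chains, and shallow questions come first. This sketch is illustrative only — the classes, thresholds, and bug records are invented and do not reproduce TreeInstruct's actual state-space planning algorithm.

```python
from collections import deque

class QuestionNode:
    """One probing question targeting a specific bug (illustrative)."""
    def __init__(self, bug, depth=0):
        self.bug = bug
        self.depth = depth  # how much scaffolding this question needs

def plan_questions(unresolved_bugs, student_state):
    """Queue probing questions, shallower (better-mastered) concepts first.

    `student_state` maps concept -> estimated mastery in [0, 1]; bugs
    touching weaker concepts get deeper question chains.
    """
    nodes = []
    for bug in unresolved_bugs:
        mastery = student_state.get(bug["concept"], 0.0)
        depth = 0 if mastery > 0.7 else (1 if mastery > 0.3 else 2)
        nodes.append(QuestionNode(bug, depth))
    # Independent, well-understood mistakes before dependent, harder ones.
    nodes.sort(key=lambda n: n.depth)
    return deque(nodes)

bugs = [
    {"id": "off-by-one", "concept": "loops"},
    {"id": "type-error", "concept": "types"},
]
state = {"loops": 0.8, "types": 0.2}
queue = plan_questions(bugs, state)
print([n.bug["id"] for n in queue])  # ['off-by-one', 'type-error']
```

The multi-turn aspect would come from re-estimating `student_state` after each response and re-planning, which is what lets such a tutor address independent and dependent mistakes in the same session.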
iOS 16.4 is out with bug fixes and a ton of new emoji
If you own an Apple device, check your notifications: Apple has officially released updates for each of its major platforms. The macOS 13.3, iOS/iPadOS 16.4, and watchOS 9.4 updates include 21 new emoji, improved voice isolation for calls, and a smattering of minor bug fixes. To start, the emoji keyboard gains five new animals, two new hand gestures, three new colored hearts, and a handful of household objects such as a folding fan, a flute, and maracas. When you're not spamming friends with the new goose emoji, you'll be enjoying the benefits of the more subtle updates. Cellular calls now have Voice Isolation, designed to block out ambient noise.
Understanding Bugs in Multi-Language Deep Learning Frameworks
Li, Zengyang, Wang, Sicheng, Wang, Wenshuo, Liang, Peng, Mo, Ran, Li, Bing
Deep learning frameworks (DLFs) play an increasingly important role in the current age of AI, acting as basic infrastructure for an ever-wider range of AI-based applications. Meanwhile, as multi-programming-language (MPL) software systems, DLFs inevitably suffer from bugs caused by the use of multiple programming languages (PLs). Hence, it is of paramount importance to understand the bugs of DLFs (especially the bugs involving multiple PLs, i.e., MPL bugs), which can provide a foundation for preventing, detecting, and resolving bugs in the development of DLFs. To this end, we manually analyzed 1,497 bugs in three MPL DLFs: MXNet, PyTorch, and TensorFlow. First, we classified bugs in these DLFs into 12 types (e.g., algorithm design bugs and memory bugs) according to their bug labels and characteristics. Second, we explored the impacts of different bug types on the development of DLFs, and found that deployment bugs and memory bugs have the greatest negative impact on development, in different respects. Third, we found that 28.6%, 31.4%, and 16.0% of bugs in MXNet, PyTorch, and TensorFlow, respectively, are MPL bugs; the PL combination of Python and C/C++ is used in fixing more than 92% of MPL bugs in all DLFs. Finally, the code change complexity of MPL bug fixes is significantly greater than that of single-programming-language (SPL) bug fixes in all three DLFs, while in PyTorch MPL bug fixes also have longer open times and greater communication complexity than SPL bug fixes. These results provide insights for bug management in DLFs.
- Research Report > New Finding (0.93)
- Research Report > Experimental Study (0.68)
Artificial intelligence platform helps developers spend less time on unit tests, bug fixes, documentation
Developers spend around half of their time on tasks other than coding, and French startup Ponicode is looking to change that with a little help from artificial intelligence. The company, which was founded in June 2019, has created a platform embedded with AI that learns from millions of lines of code. The Ponicode platform allows developers to write unit tests directly in the editor to help decrease the number of bugs at the production stage, according to Ponicode CEO and co-founder Patrick Joubert. Coding has not changed in the past 30 years and while developers might have more libraries and tools at their disposal, "In the end, the quality of our code still only depends on our personal judgment and experience," Joubert said. "Our goal is to help developers focus on what they like to do most: coding," he said.
- North America > United States (0.06)
- Europe > France (0.06)
Google announces TensorFlow Enterprise for large-scale machine learning - SiliconANGLE
Google LLC today launched an enterprise version of TensorFlow, the popular open-source artificial intelligence framework it created to run machine learning, deep learning and other statistical and predictive analytics workloads. Common use cases include training algorithms for image recognition and recurrent neural networks, as well as sequence-to-sequence models for machine translation and natural language processing. In a launch at the O'Reilly TensorFlow World conference in Santa Clara, California, Craig Wiley (pictured), director of product management at Google Cloud AI Platform, said the launch of TensorFlow Enterprise was necessary to meet the "higher demands and expectations" of enterprises that need to scale up their machine learning projects. TensorFlow Enterprise customers will be able to take advantage of what Google says is enterprise-grade support, including long-term support for older versions of the framework. Although TensorFlow is updated regularly, not everyone is able to upgrade to the newest releases immediately.
TensorFlow Enterprise Announced; What Does It Mean For Google Cloud
Enterprises of the previous decade have transformed from transactional to digital. Today, digital enterprises use machine learning pipelines with humans in the loop. However, the enterprises of tomorrow will aim for end-to-end AI-driven core business solutions: intelligent enterprises. To address these demands, Google this week announced TensorFlow Enterprise at the ongoing TensorFlow World conference. TensorFlow, one of the most popular machine learning frameworks, was open-sourced by Google in 2015.
Google launches TensorBoard.dev and TensorFlow Enterprise
Google today announced the preview launch of TensorBoard.dev. "You'll now be able to host and track your ML experiments and share them publicly, no setup required. Simply upload your logs and share the URL so that others can see the experiments and what you're doing with TensorBoard," Google VP of engineering Megan Kacholia said onstage today at TensorFlow World in Santa Clara, California. TensorFlow Enterprise is made to deliver an optimized version of Google's open-source machine learning framework TensorFlow for large businesses. It works with Google's AI Platform and Kubernetes Engine, as well as optimized versions of Deep Learning VMs and Deep Learning Containers. The service is designed to supply up to 3x improvements in data reading -- the result of changes to how TensorFlow reads and caches files -- and up to 3 years of support for security patches and select bug fixes.